Skip to content

fix(kvcache): trace-faithful PositionalKVCache.update (#763)#193

Merged
michalharakal merged 1 commit into
developfrom
fix/763-kvcache-trace-fidelity
Jun 25, 2026
Merged

fix(kvcache): trace-faithful PositionalKVCache.update (#763)#193
michalharakal merged 1 commit into
developfrom
fix/763-kvcache-trace-fidelity

Conversation

@michalharakal

Copy link
Copy Markdown
Contributor

Summary

PositionalKVCache.update is not trace-faithful: it copies the incoming K/V tensor data into a heap FloatArray (writeAt) and reads it back via ctx.fromData (sliceView), bypassing ctx.ops. Under tracing the symbolic K/V carry no data, so toComputeGraph(embedConstants=true) bakes an all-zero KV buffer as stablehlo.constant and disconnects the computed k_proj/v_proj — the exported decoder attends over K=V=0.

Eager inference is fine (the buffer holds real data). In export the bug was masked in plain decoders (the unnormalized FFN dominates the residual stream) and exposed by Gemma's sandwichNorms (post_ffw_norm normalizes the FFN, so the lost attention becomes significant → ~1.4×/block logit error). Tracked as SKaiNET#763.

Fix

When ctx.isRecording, wire K/V functionally through ops.concat (the same history AppendKVCache already uses) instead of the raw heap buffer, so the StableHLO export carries the real projections. The eager fast-path (heap buffer) is unchanged. Side effect: the traced graph no longer surfaces dangling KV-cache buffer leaves as graph outputs.

Verification

End-to-end via skainet-iree-conformance gemma-decoder — a real 2-block Gemma3 decoder with qk-norm + sandwichNorms + layer-output-scale: export → iree-compile → iree-run-module matches the SKaiNET-CPU oracle at max_abs_err 3.8e-6 (was ~5.3). Adds KvCacheTraceFidelityTest (asserts a traced sandwich decoder bakes no zero KV-cache constants).

Note: the other PositionalKVCache-derived shared variants (SharedPositionalKVCache, PaddedSharedPositionalKVCache, OwnerReadOnlyKVCache) still use the buffer sliceView on read and would need the same treatment for kv-shared exports — out of scope here (gemma-decoder uses the non-shared cache).

🤖 Generated with Claude Code

PositionalKVCache.update copied the incoming K/V tensor *data* into a heap FloatArray
(writeAt) and read it back via ctx.fromData (sliceView), bypassing ctx.ops. Under tracing
the symbolic K/V carry no data, so toComputeGraph(embedConstants=true) baked an all-zero
KV buffer as stablehlo.constant and disconnected the computed k_proj/v_proj — the exported
decoder then attended over K=V=0. In eager inference the buffer holds real data, so the bug
was invisible there; in export it was masked in plain decoders by the unnormalized FFN
dominating the residual stream, and exposed by Gemma's sandwichNorms (post_ffw_norm
normalizes the FFN, so the lost attention becomes significant — ~1.4x/block logit error).

Fix: when ctx.isRecording, wire K/V functionally through ops.concat (the same history
AppendKVCache already uses) instead of the raw heap buffer, so the StableHLO export carries
the real projections. The eager fast-path (heap buffer) is unchanged. As a side effect the
traced graph no longer surfaces dangling KV-cache buffer leaves as graph outputs.

Verified end-to-end via skainet-iree-conformance gemma-decoder (real 2-block Gemma3 with
qk-norm + sandwichNorms + layer-output-scale): export -> iree-compile -> iree-run-module
matches the SKaiNET-CPU oracle at max_abs_err 3.8e-6 (was ~5.3). Adds KvCacheTraceFidelityTest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit d9a0518 into develop Jun 25, 2026
0 of 4 checks passed
@michalharakal michalharakal deleted the fix/763-kvcache-trace-fidelity branch June 25, 2026 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant